A formal definition of the ε-complexity of an individual continuous function
defined on a unit cube is proposed. This definition is consistent with
Kolmogorov's idea of the complexity of an object. A definition of ε-complexity
for a class of continuous functions with a given modulus of continuity is
also proposed. Additionally, an explicit formula for the ε-complexity of a
functional class is obtained. As a consequence, the paper finds that the
ε-complexity for the Hölder class of functions can be characterized by a pair
of real numbers. Based on these results, the paper formulates a conjecture
concerning the ε-complexity of an individual function from the Hölder
class. We also propose a conjecture about the characterization of the
ε-complexity of a function from the Hölder class given on a discrete grid.

1 Introduction

The concept of "complexity" of an object is one of the fundamental scientific 
paradigms. There are numerous efforts in the literature to define the complexity 
properly. There are many attempts to apply it in practice as well. 

One of the first efforts to provide a quantitative approach to the concept of
"complexity of a physical system" was made in the 1870s by the Austrian physicist
Ludwig Boltzmann, who introduced the notion of entropy in equilibrium
statistical physics. The greater the entropy, the more "complicated" the system is.

In the 1940s Claude Shannon developed information theory using the concept
of entropy of a probability distribution. He interpreted the entropy as a measure
of the "degree of uncertainty" inherent in a particular probability
distribution. Under natural conditions this measure was proven to be unique. It is
known that the number of "typical" trajectories of a stationary ergodic random
sequence with sample size n can be expressed by the formula ~ exp(n H_S)
as n → ∞. Here, H_S is the Shannon entropy of the underlying distribution
(see details in, e.g., [1]). Hence, the higher the entropy, the more "complicated"
the system is.

Kolmogorov and Sinai (see, e.g., [2]) introduced the concept of entropy in
the theory of dynamical systems. In fact, their definition was a generalization 
of the Shannon entropy. The dynamical system's entropy is determined by the 
large-time asymptotic behavior of the coefficient appearing in the logarithm of 
the number of different types of trajectories of a dynamical system. Again, the 
entropy of a dynamical system may serve as a measure of its "complexity": the 
more "complex" the system, the richer the variety of its trajectories. 

Since functions are some of the most basic mathematical objects, the question
of how to define the complexity of a continuous function is quite natural. It is also
important for practical applications. In particular, a quantitative characterization
of the complexity of a continuous function could be used to solve the problem of data
segmentation. Consider, for example, a time series generated by different and
unknown mechanisms (either stochastic or deterministic; we shall call such data
non-homogeneous). To analyze and model non-homogeneous data it is necessary
to perform their segmentation first.

In order to estimate the complexity of a continuous function, one can try to use
the Shannon entropy approach, but from our point of view, this approach is not
suitable. Indeed, let us consider the function x(t) = t, t ∈ [-1, 1]. Obviously,
the distribution of the values of this function is uniform on the interval [-1, 1].
Therefore, the formally calculated Shannon entropy for a discrete distribution of the
function's values on a uniform grid is maximal. Hence, the complexity is maximal
if it is measured by the entropy. But, in fact, a straight line is a very simple object,
which is completely defined by two points.
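
The straight-line example can be checked numerically. The following is a minimal sketch, assuming a histogram estimate of the Shannon entropy; the number of bins, the sample size, the helper name `histogram_entropy` and the comparison function t^9 are illustrative choices that do not come from the paper.

```python
import numpy as np

def histogram_entropy(values, n_bins, value_range):
    """Shannon entropy (in nats) of the histogram of `values` (assumed estimator)."""
    counts, _ = np.histogram(values, bins=n_bins, range=value_range)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())

n_bins = 20
t = np.linspace(-1.0, 1.0, 10_000, endpoint=False)  # uniform grid on [-1, 1)

h_line = histogram_entropy(t, n_bins, (-1.0, 1.0))        # x(t) = t
h_power = histogram_entropy(t ** 9, n_bins, (-1.0, 1.0))  # x(t) = t^9, values pile up near 0
h_max = float(np.log(n_bins))                             # largest entropy possible with n_bins bins

print(h_line, h_power, h_max)
```

Under these assumptions the straight line attains essentially the maximal entropy log(n_bins), even though it is fully determined by two points.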

From another point of view, using the concept of entropy of a dynamical
system is also inappropriate if one wants to estimate the complexity of a continuous
function. Indeed, in the modern theory of dynamical systems it is assumed that
their law of evolution does not change over time. However, non-autonomous
ordinary differential equations do not satisfy this condition, and estimation of the
complexity of continuous functions generated by these equations is not covered
by the theory of dynamical systems. Moreover, not every continuous function is
generated by a dynamical system.

So, to the best of our knowledge, the existing complexity theory provides no
satisfactory method to estimate the complexity of a continuous function.
Moreover, from our point of view it is essential that a proper definition of the
complexity of a function should not depend on the mechanism generating the function.

In the mid-1960s, Kolmogorov suggested an algorithmic approach to
the notion of an object's complexity. The main idea of this approach (see [3]) is as
follows: a "complex" object requires a lot of information for its reconstruction,
whereas for a "simple" object little information is needed. He formalized this idea in
the language of the theory of algorithms. In particular, the algorithmic complexity
measures the length of the program which leads to the selection of a particular
object from a set of objects. This approach is closest to our definition of the
complexity of a continuous function.

The first-named author, see Darkhovsky 0), proposed to measure the
ε-complexity of a continuous function by the number of its values (given on a uniform
grid) that are required for its reconstruction by a fixed family of approximation
methods with a given marginal error ε. This approach was successfully pre-tested
on human electroencephalographic data [4].

In this paper, we further develop and modify this concept. The main result of
this paper provides an effective characterization of the ε-complexity for a class
of continuous functions given on a unit cube in a finite-dimensional Euclidean
space. Specifically, we prove that, for Hölder class functions, there exists an affine
relationship between the ε-complexity and the logarithm of the function
reconstruction error ε. In other words, the ε-complexity of the Hölder class functions
can be characterized by a pair of real numbers.

The above result leads us to formulate the following conjecture: the
ε-complexity of an individual function from the Hölder class also has, in logarithmic
coordinates, an affine dependence on ε, and can also be characterized by a pair
of real numbers.

This conjecture is supported by preliminary simulations. 

The paper is organized as follows. In Section 2, we propose a definition of the
ε-complexity of a continuous function given on a unit cube in a finite-dimensional
Euclidean space. In Section 3, we give a definition of the ε-complexity for a class
of functions with a fixed modulus of continuity, and prove the theorem
regarding the ε-complexity of a functional class. The corollary of this theorem gives
a characterization of the ε-complexity for the Hölder class of functions. In this
section, we also formulate the conjecture which characterizes the ε-complexity of
an individual function from the Hölder class.

In Section 4, we introduce a definition of the ε-complexity of an individual
continuous function given by its values on a discrete grid. In Section 5, we
discuss the computational aspects of evaluating the ε-complexity, and formulate our
basic conjecture. This conjecture gives a numerical characterization of the
ε-complexity. Finally, conclusions are provided in Section 6.



2 Complexity of an individual function 

Without loss of generality we assume that a continuous function x(t) is defined on
a unit cube I in the space $\mathbb{R}^k$. On the set of such functions we introduce a norm
$\|\cdot\|$. To be able to compare the complexity of different functions, it is reasonable to
assume that $\|x(t)\| = 1$ (i.e., essentially, to consider $x(t)/\|x(t)\|$ instead of x(t)).

Let $Z_h$ be a k-dimensional grid with spacing h, and $I_h = I \cap Z_h$. Assume that
we only know the values of x(t) at the points of the set $I_h$. Given this information,
with what precision can we reconstruct the function?

Suppose we have a fixed set $\mathcal{F}$ of methods for approximating functions
whose values are given only at the points of $I_h$. Let $\hat{x}(t)$ be an approximation which is
constructed using one of the allowable methods of approximation. Consider the
approximation error

$$\delta(h) = \inf_{\mathcal{F}} \|x(t) - \hat{x}(t)\|,$$

where the infimum is taken over the whole set $\mathcal{F}$.

It is clear that the function $\delta(h)$ is nondecreasing: increasing the grid
spacing means that we discard more and more information about the function
values. If we fix a certain "acceptable" (user-specified) error level, ε > 0, then
we can determine the fraction of the function values that could be discarded while
still permitting reconstruction of the original function (again, via the fixed family
$\mathcal{F}$ of approximation methods) with error not exceeding ε. Note that, in general,
the approximation error should be related to the norm of the function but, since
we assume that the function is normalized, $\delta(h)$ really measures the relative error.

Let

$$h^*(\varepsilon) = \begin{cases}
\inf\{h \le 1 : \delta(h) > \varepsilon\}, & \text{if } \{h : \delta(h) > \varepsilon\} \ne \emptyset, \\
1, & \text{if the set is empty.}
\end{cases} \qquad (1)$$

Hence, $h^*(\varepsilon)$ is the smallest grid spacing at which the error of the function
reconstruction from its values on the grid exceeds the given ε.

The value $(1/h^*(\varepsilon))^k$ estimates the number of points in the set $I_{h^*(\varepsilon)}$ that must
be retained to achieve a given approximation error, and it is natural to use the
quantity $1/h^*(\varepsilon)$ to define a measure of function complexity. There is some
flexibility here, since as a quantitative measure of the ε-complexity we can employ any
monotonically increasing function of $1/h^*(\varepsilon)$. However, as we shall see below,
the use of the logarithmic function enables us to get a particularly effective
characterization of the complexity. Thus we introduce the following definition:

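Definition 1. The number

$$S(\varepsilon, \mathcal{F}, \|\cdot\|) \stackrel{\text{def}}{=} S(\varepsilon) = \log \frac{1}{h^*(\varepsilon)}$$

is called the $(\varepsilon, \mathcal{F}, \|\cdot\|)$-complexity (or, briefly, ε-complexity) of an individual function
x(t).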

In other words, the ε-complexity of a continuous function on a segment is the
(logarithmic) fraction of the function values that must be retained to reconstruct
the function via a certain fixed family of approximation methods with a given
error.
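
Definition 1 can be illustrated numerically in one dimension. The sketch below is only an illustration under several assumptions that are not in the text: the family of methods is reduced to a single method (piecewise-linear interpolation), grid spacings are restricted to h = 1/m for integers m up to `max_m`, the uniform norm is evaluated on a fine auxiliary grid, and all function names are hypothetical.

```python
import numpy as np

def delta(x, m, n_fine=4001):
    """Relative sup-norm error of piecewise-linear reconstruction of x on [0, 1]
    from its values on the grid with spacing h = 1/m (assumed single method)."""
    t = np.linspace(0.0, 1.0, n_fine)
    knots = np.linspace(0.0, 1.0, m + 1)
    x_hat = np.interp(t, knots, x(knots))
    scale = np.max(np.abs(x(t)))          # normalization so that the norm of x is 1
    return float(np.max(np.abs(x(t) - x_hat)) / scale)

def h_star(x, eps, max_m=200):
    """Smallest candidate spacing h = 1/m whose error exceeds eps; 1 if there is none."""
    bad = [1.0 / m for m in range(1, max_m + 1) if delta(x, m) > eps]
    return min(bad) if bad else 1.0

def eps_complexity(x, eps, max_m=200):
    """S(eps) = log(1 / h*(eps)), following Definition 1."""
    return float(np.log(1.0 / h_star(x, eps, max_m)))

def line(t):
    return t

def slow(t):
    return np.sin(2 * np.pi * t)

def fast(t):
    return np.sin(6 * np.pi * t)

s_line = eps_complexity(line, 0.05)
s_slow = eps_complexity(slow, 0.05)
s_fast = eps_complexity(fast, 0.05)
print(s_line, s_slow, s_fast)
```

With these choices the affine function has zero complexity, and the faster oscillation needs a finer grid than the slower one for the same error level.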

Note that the ε-complexity is a continuous functional on the space of continuous
functions equipped with the norm which was used to define the approximation
error.

It is natural to assume that $\mathcal{F}$ contains at least the method of approximation of
functions via affine functions of the form at + b. In this case, if x(t) itself is an
affine function on I, then its error-free recovery requires knowledge of (k + 1) of
its values on linearly independent points. But $\#(I_h) \ge (k + 1)$ for any $0 < h \le 1$.
Therefore, according to Definition 1, for any affine function we have $h^*(0) = 1$,
and its 0-complexity S(0) is equal to zero.

Note that the proposed measure of complexity is an individual characteristic
of a particular function, rather than of a set of functions generated by a certain 
mechanism (as is the case of the entropy of a dynamical system). Furthermore, 
this measure does not depend on the mechanisms generating the function. It is 
insensitive to whether the function is a sample path of a random field/process, or 
a trajectory of a dynamical system. 



3 Complexity of a functional class 

Let C be the space of continuous functions with the standard norm,
$\|x(\cdot)\|_C = \max_{t \in I} |x(t)|$. Denote by

$$\omega_x(h) = \max_{t, s \in I,\ \|t - s\| \le h} |x(t) - x(s)|$$

the modulus of continuity of the function x(t). It is well known that the function
$\omega(\cdot)$ is continuous and non-decreasing.
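
For a sampled one-dimensional function, the modulus of continuity can be estimated directly from its definition. The following sketch assumes a uniform sampling of [0, 1]; the helper name `modulus_of_continuity` and the test function sqrt(t) are illustrative assumptions.

```python
import numpy as np

def modulus_of_continuity(x, h):
    """Estimate omega_x(h) for samples x[i] = x(i / (len(x) - 1)) on [0, 1]."""
    n = len(x)
    dt = 1.0 / (n - 1)
    # Largest index lag whose distance lag * dt does not exceed h
    # (a tiny tolerance guards against floating-point rounding).
    max_lag = min(n - 1, int(np.floor(h / dt + 1e-9)))
    omega = 0.0
    for lag in range(1, max_lag + 1):
        omega = max(omega, float(np.max(np.abs(x[lag:] - x[:-lag]))))
    return omega

t = np.linspace(0.0, 1.0, 1001)
x = np.sqrt(t)                       # Hölder function with exponent 1/2
hs = [0.01, 0.04, 0.16]
omegas = [modulus_of_continuity(x, h) for h in hs]
print(omegas)
```

For this test function the estimate agrees with the exact modulus sqrt(h), which is attained at the pair (0, h).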

Let $U \subset X_\omega$ be an arbitrary bounded set in the class $X_\omega$ of all functions with
a given modulus of continuity $\omega(\cdot)$, and let
$R \stackrel{\text{def}}{=} \sup_{x(\cdot) \in U} \|x(\cdot)\|_C$. We define the
ε-complexity, $S_U(\varepsilon, \omega)$, of the set U as follows:

Definition 2. The number

$$S_U(\varepsilon, \omega) = \log \frac{1}{h(\varepsilon)},$$

where $h(\varepsilon)$ is the grid spacing such that the maximum (over the set $U \subset X_\omega$)
error of the optimal function reconstruction using its values on the grid does not
exceed ε, is called the ε-complexity of the set $U \subset X_\omega$.

Thus, to estimate $S_U(\varepsilon, \omega)$ we have to find the minimum of the maximal (over all
functions from $U \subset X_\omega$) error of the function reconstruction from a given class
using its values on the grid with spacing h (we call the corresponding error the
minimax reconstruction error).

Remark 3.1. For any individual function it is natural to calculate the relative 
error of the function reconstruction, i.e. the error which is scaled to the norm of 
the function. But, for any bounded set from the class of continuous functions, 
the relative reconstruction error for the class must be calculated as the ratio of 
the minimax reconstruction error to the maximal norm of the functions from the 
given set. Therefore, to calculate $S_U(\varepsilon, \omega)$ we have to consider the absolute minimax
reconstruction error. ■

Theorem 3.1. Let us assume that the reconstruction error is measured in the
uniform norm $\|\cdot\|_C$, and that the modulus of continuity $\omega(\cdot)$ has an inverse (i.e.,
it is strictly increasing). Then the ε-complexity of any bounded set $U \subset X_\omega$
is expressed by the following relationship:

$$S_U(\varepsilon, \omega) = \log \frac{\sqrt{k}}{2\,\omega^{-1}(\varepsilon)}. \qquad (2)$$

Remark 3.2. If $\omega(\cdot)$ is not strictly increasing, then its inverse, $\omega^{-1}(\cdot)$, in
formula (2) should be replaced by the generalized inverse $\min\{h : \omega(h) = \varepsilon\}$. ■

Proof. To prove the theorem, we have to calculate the minimax reconstruction error
$\delta_{cl}(h)$ for any given grid spacing h. Since we consider the norm $\|\cdot\|_C$, it is sufficient
to find the value $\delta_{cl}(h)$ only for one cell of $I_h$.

Let $t^0 \in I_h$, $t^0 = (t_1, \dots, t_k)$, and let $e_i$ be a k-dimensional vector whose
components represent an arbitrary set of zeros and ones, i.e.
$e_1 = (0, 0, \dots, 0), \dots, e_m = (1, 1, \dots, 1)$ (obviously, the number m of such
vectors is equal to $2^k$). Consider the values of some function x(t) from $U \subset X_\omega$
on a single cell of $I_h$, i.e., on the set $\{x(t^i)\}_{i=1}^m$, $t^i = t^0 + h e_i$, $t^0 \in I_h$,
and pose the problem of estimating the value of the function at an arbitrary point
τ inside the cell. In other words, we have to solve the problem

$$\sup_{\{x(t^i)\}} \ \sup_{x(\tau):\ x(\cdot) \in U} |x(\tau) - u| \to \inf_u, \qquad (3)$$

where the internal supremum is taken over all the values of x(τ), x(·) ∈ U, and the
external supremum is taken over all admissible (in U) values $\{x(t^i)\}_{i=1}^m$.

Denote by $\varphi(\tau)$ the value of the optimization problem (3). By definition,
the norm of $\varphi(\tau)$ is equal to the minimax reconstruction error $\delta_{cl}(h)$. Let
$r_i = \|\tau - t^i\|$, i = 1, ..., m. Then the set of all possible values of x(τ), x(·) ∈ U,
given the fixed collection $\{x(t^i)\}_{i=1}^m$, is equal to the segment

$$D = \bigcap_{i=1}^{m} \left[ x(t^i) - \omega(r_i),\ x(t^i) + \omega(r_i) \right],$$

and the solution of the optimization problem (3) under the same conditions (i.e.,
the minimax estimate of the function value at the point τ under a given fixed collection
of admissible (in U) values $\{x(t^i)\}_{i=1}^m$) is the midpoint of this segment, and the error
of the approximation is equal to half the length of D.

It is easy to see that the length of D is maximal if $x(t^i) = a = \text{const}$,
i = 1, ..., m. Then the optimal selection in (3) is $u_{opt} = a$, and the value
$\varphi(\tau) = \min_{1 \le i \le m} \omega(\|\tau - t^i\|)$ does not depend on a.

To calculate the norm (in the space of continuous functions) of the minimax
recovery error it is necessary to find the "worst" point τ in the cell, that is, a point
where the function $\varphi(\tau)$ reaches its maximum. Since $\omega(\cdot)$ is a monotonically
increasing function, it is necessary to find a point τ inside the cell such that
the minimum of the distances from this point to the vertices of the cell is the
largest. It is easy to see that such a point is the center of the cell, $\tau^*$. Obviously,
$\|\tau^* - t^i\| = \sqrt{k}\,h/2$. Therefore, $\|\varphi(\cdot)\|_C = \varphi(\tau^*) = \omega(\sqrt{k}\,h/2) = \delta_{cl}(h)$. Finally,
to find $h(\varepsilon)$ from Definition 2 we have to solve the equation $\omega(\sqrt{k}\,h/2) = \varepsilon$. The
solution is $h(\varepsilon) = 2\omega^{-1}(\varepsilon)/\sqrt{k}$, which concludes the proof of the Theorem. □
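
The geometric step of the proof (the center of a cell is the point farthest from its nearest vertex, at distance sqrt(k) h / 2) can be checked numerically. This is a sketch by random sampling, assuming the cell [0, h]^k; the dimension, spacing, sample size and helper name are illustrative choices.

```python
import itertools
import numpy as np

def min_vertex_distance(points, h):
    """Distance from each point to the nearest vertex of the cell [0, h]^k."""
    k = points.shape[1]
    vertices = h * np.array(list(itertools.product([0.0, 1.0], repeat=k)))
    dists = np.linalg.norm(points[:, None, :] - vertices[None, :, :], axis=2)
    return dists.min(axis=1)

k, h = 3, 0.25
rng = np.random.default_rng(0)
samples = rng.uniform(0.0, h, size=(20_000, k))   # random points tau inside the cell
center = np.full((1, k), h / 2.0)

worst_sampled = float(min_vertex_distance(samples, h).max())
at_center = float(min_vertex_distance(center, h)[0])
bound = float(np.sqrt(k) * h / 2.0)
print(worst_sampled, at_center, bound)
```

No sampled point exceeds the value attained at the center, which is consistent with the minimax error being the modulus of continuity evaluated at sqrt(k) h / 2.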

Corollary 3.1. The ε-complexity of any bounded subset U of the Hölder class
functions is given by the formula

$$S_U(\varepsilon, H) = A + B \log \varepsilon, \qquad (4)$$

for some values of the coefficients A and B.

Proof. By definition, for the Hölder class functions $\omega(h) = L h^p$. Therefore,
$\omega^{-1}(\varepsilon) = (\varepsilon / L)^{1/p}$, and we get (4) from (2). □
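
For reference, the coefficients in (4) can be written out explicitly. The short derivation below is a sketch which assumes that the ε-complexity of the class has the form log(1/h(ε)) with h(ε) = 2ω^{-1}(ε)/√k, as obtained at the end of the proof of Theorem 3.1; here L and p denote the Hölder constant and exponent.

```latex
\begin{align*}
S_U(\varepsilon, H)
  &= \log \frac{1}{h(\varepsilon)}
   = \log \frac{\sqrt{k}}{2\,\omega^{-1}(\varepsilon)}
   = \log \frac{\sqrt{k}}{2} - \frac{1}{p}\,\log \frac{\varepsilon}{L} \\
  &= \underbrace{\log \frac{\sqrt{k}}{2} + \frac{1}{p}\,\log L}_{A}
     \;\underbrace{-\,\frac{1}{p}}_{B}\,\log \varepsilon .
\end{align*}
```

Under this reading, B = -1/p depends only on the Hölder exponent, while A depends on k, L and p.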

Let $x_0(\cdot)$ be an individual function from the Hölder class $X_H$. Consider the
set $U = \{x(\cdot) \in X_H : \|x(\cdot)\|_C \le \|x_0(\cdot)\|_C\}$. Then, in the case of a sufficiently
rich set $\mathcal{F}$ of approximation methods, the ε-complexity of an individual function
$x_0(\cdot)$ should be smaller than the ε-complexity of the corresponding set, i.e.,
$0 \le S_{x_0}(\varepsilon, H) \le S_U(\varepsilon, H)$. This fact justifies the following conjecture:

Conjecture 1. The ε-complexity of an individual function from the Hölder class
satisfies (4) for some values of the coefficients A and B.

Remark 3.3. It can be shown that a relation of type (4) also holds if the error is
measured in the norm of the space $L_p$. ■



4 Complexity of a continuous function given on a 
discrete grid 

In the majority of applications, we deal with functions given by their values at
a discrete set of points (i.e., by a finite sample). We still assume that this set of
values is the trace of a continuous function on a lattice in the unit cube of the
k-dimensional Euclidean space. Let us consider how the definition of complexity
has to be adjusted to this situation.

Let $N^k$ be the number of values of the continuous function x(t) on the
k-dimensional lattice of the unit cube. Consider the quantity $h^*(\varepsilon)$ introduced in (1)
and suppose that $[h^*(\varepsilon)N]^k \ge 1$. It is easy to see that we can discard $[h^*(\varepsilon)N]^k$
function values from each k-dimensional cube of size $h^*(\varepsilon)$, and the
reconstruction error will be less than or equal to ε. In other words, the number of values
sufficient for the function reconstruction with a relative error not exceeding ε is
equal to $n^* = [N^k / [h^*(\varepsilon)N]^k]$.

Hence, in accordance with the general idea of Section 2, the ε-complexity is
a logarithmic fraction of $n^*$, and we can formulate the following definition.

Definition 3. The value 

is called the ε-complexity of the individual function x(t) given by the set of its discrete
values.

The next result follows directly from (5). 
Theorem 4.1.

$$\lim_{N \to \infty} S_N(\varepsilon) = S(\varepsilon).$$

The growth of N means growth in the sampling frequency if the function
domain is fixed. Therefore, in the case of a sufficiently high sampling frequency of
the function, the ε-complexity of the sample calculated over the discrete set of
values is not very different from the true ε-complexity.

Of course, the question arises of what the sampling frequency should be to make
this difference sufficiently small, but if we are dealing with data obtained with the
same sampling frequency, this question is not essential. In any case we must bear
in mind that the comparison of functions in the case of a discrete set of values can
be performed only when the sampling frequency is the same.

Given the above, we can formulate the conjecture that for the Hölder class
functions (compare with (4)) the following equality should be true:

$$S_N(\varepsilon) = A + B \log \varepsilon. \qquad (6)$$



5 Estimation of the complexity coefficients. Basic 
Conjecture. 

When processing real data, we usually have to deal with functions defined by their
values at a discrete set of points. Therefore the algorithm for estimating complexity
is focused on this situation.

Suppose we are given an array of N function values. Let us choose a
number $0 < \mathbb{S} < 1$, and discard from the array $[(1 - \mathbb{S})N]$ values. In the next step
we use the remaining $[\mathbb{S}N]$ values to approximate the values of the function at
all discarded points using a collection $\mathcal{F}$ of approximation methods, and find the
best approximation (the approximation with the smallest error).

Two factors have to be taken into account. First, the remaining points should 
be distributed relatively uniformly. Second, since the error of the approximation 
depends on the location of the remaining points, for the sake of the stability of the 
method it is expedient, for a given percentage of removed points, to choose dif- 
ferent selection schemes and average the corresponding minimal approximation 
errors over them. This will allow us to smooth out the unavoidable random errors 
in the calculations. 
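
The procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a one-dimensional sample, a family of only two methods (piecewise-linear and piecewise-constant reconstruction), uniform selection schemes that keep every `step`-th value with different offsets, and hypothetical function names.

```python
import numpy as np

def recovery_error(x, step, offset):
    """Smallest relative sup-error over a tiny family of methods when only
    the samples x[offset::step] are retained (fraction kept is about 1/step)."""
    idx = np.arange(len(x))
    kept = idx[offset::step]
    inside = (idx >= kept[0]) & (idx <= kept[-1])   # skip extrapolation at the ends
    # Method 1 (assumed): piecewise-linear interpolation through the kept samples.
    linear = np.interp(idx, kept, x[kept])
    # Method 2 (assumed): piecewise-constant, using the nearest kept sample on the left.
    pos = np.clip(np.searchsorted(kept, idx, side="right") - 1, 0, None)
    constant = x[kept[pos]]
    scale = np.max(np.abs(x))
    errors = [np.max(np.abs(x - approx)[inside]) / scale for approx in (linear, constant)]
    return float(min(errors))

def mean_recovery_error(x, step):
    """Average the minimal error over all `step` shifted selection schemes."""
    return float(np.mean([recovery_error(x, step, o) for o in range(step)]))

t = np.linspace(0.0, 1.0, 2049)
x = np.sin(2 * np.pi * t) + 0.5 * np.sin(10 * np.pi * t)
errors = {step: mean_recovery_error(x, step) for step in (1, 4, 16, 64)}
print(errors)
```

Averaging over the shifted schemes plays the role of the smoothing step described above; a richer family of methods would only lower the minimal error.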

Thus, for each given value of $\mathbb{S}$ we determine the minimal error ε of
the function recovery. It is obvious that for any $\mathbb{S} > 0$ the error of the function
recovery tends to zero as N → ∞ (we always assume that the grid is uniform). On
the other hand, if the sample size N is too small, then estimation of the recovery
error will be affected by calculation errors even for values of $\mathbb{S}$ close to 1.

For this reason, and based on the previous results (see (4), (6)), we can state the
following basic conjecture:

Conjecture 2. For any function from the Hölder class given by its discrete values,
we can specify a sample size N of the data such that, for this size, there exists
an interval $[\alpha, \beta]$, $0 < \alpha \le \mathbb{S} \le \beta < 1$, within which the following
relationship holds:

$$\log \varepsilon = A + B \log \mathbb{S}, \qquad (7)$$

where ε is the minimal error of the function recovery by the given set of reconstruction
methods.

Let us explain the relationship (7). According to the definition, the
ε-complexity is the logarithm of the number of function values needed to reconstruct
the function with the error ε. Therefore, in (4) and (6) the logarithm is taken only
of ε. In the case of discrete data we deal with the value $\mathbb{S}$, and the analogue of the
ε-complexity in that case is $\log \mathbb{S}$.
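
A relationship of the form (7) can be estimated from data by ordinary least squares in logarithmic coordinates. The sketch below is an illustration under assumptions that are not in the paper: a single reconstruction method (linear interpolation), uniform selection schemes averaged over offsets, the test function sqrt(|t - 1/2|), and a fit with `numpy.polyfit`.

```python
import numpy as np

def mean_linear_error(x, step):
    """Relative sup-error of linear interpolation from x[offset::step],
    averaged over all offsets (an assumed, minimal choice of method)."""
    idx = np.arange(len(x))
    scale = np.max(np.abs(x))
    errs = []
    for offset in range(step):
        kept = idx[offset::step]
        inside = (idx >= kept[0]) & (idx <= kept[-1])   # no extrapolation
        approx = np.interp(idx, kept, x[kept])
        errs.append(np.max(np.abs(x - approx)[inside]) / scale)
    return float(np.mean(errs))

t = np.linspace(0.0, 1.0, 4097)
x = np.sqrt(np.abs(t - 0.5))                  # Hölder function with exponent p = 1/2

steps = np.array([4, 8, 16, 32, 64])
fractions = 1.0 / steps                       # fraction of retained values
eps = np.array([mean_linear_error(x, s) for s in steps])

# Least-squares fit of log(eps) = A + B * log(fraction), cf. relationship (7).
B, A = np.polyfit(np.log(fractions), np.log(eps), 1)
print(f"A = {A:.3f}, B = {B:.3f}")
```

For this test function the fitted slope B is negative, since the recovery error decreases as a larger fraction of the values is retained.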

Our preliminary computational experiments show that the
relationship (7) holds fairly well. The description of the computational experiments and
simulations is in preparation and will be presented in a separate publication.

Remark 5.1. It follows from the basic conjecture that there exists a
correspondence between any Hölder function and the parameters (A, B) of its ε-complexity.
But this correspondence is not one-to-one. This raises the question of whether
it is possible to distinguish between functions with close parameters
(A, B). It is useful to consider the discrete analogues of the derivatives (i.e., the
corresponding differences of order i, i = 1, ..., s). These analogues can be
obtained from the initial sample, and then for these differences it is necessary to
find the complexity parameters $\{A_i, B_i\}_{i=1}^s$. These additional parameters will improve
the distinguishability of functions with close complexity parameters. ■

6 Conclusion 

In this paper we proposed a formal definition of the ε-complexity of a continuous
function defined on a unit cube in a finite-dimensional space. This definition
agrees with the idea of Kolmogorov complexity of objects. Roughly speaking,
the ε-complexity of a continuous function can be estimated by the fraction of the
function values which is required to reconstruct the function with a given error ε
and with a given set of approximation methods.

We show that the ε-complexity has an effective characterization, due to the
detected affine dependence: the ε-complexity of an individual function of the Hölder
class can be characterized by a pair of real numbers, which we call the
complexity coefficients.

This characterization has the potential to be used for the segmentation of time series and
for classification problems. All known methods of non-homogeneous data
segmentation are based on information about changing probability distributions (in the case
of probabilistic generating mechanisms) or models of generating mechanisms (in
the case of deterministic or mixed mechanisms). If the time series is generated by
different mechanisms (either probabilistic or deterministic) in different time intervals,
the complexity coefficients can be used as "internal" characteristics of the function.
Therefore they will enable us to detect changes of data generating mechanisms using
only the "internal" characteristics of a function (i.e., its ε-complexity).